Embedding Regression: Models for Context-Specific Description and Inference
PEDRO L. RODRIGUEZ, ARTHUR SPIRLING, BRANDON M. STEWART
Journal:

American Political Science Review / Volume 117 / Issue 4 / November 2023

Published online by Cambridge University Press:

19 January 2023, pp. 1255-1274

Print publication:

November 2023
- Article
Social scientists commonly seek to make statements about how word use varies over circumstances—including time, partisan identity, or some other document-level covariate. For example, researchers might wish to know how Republicans and Democrats diverge in their understanding of the term “immigration.” Building on the success of pretrained language models, we introduce the à la carte on text (conText) embedding regression model for this purpose. This fast and simple method produces valid vector representations of how words are used—and thus what words “mean”—in different contexts. We show that it outperforms slower, more complicated alternatives and works well even with very few documents. The model also allows for hypothesis testing and statements about statistical significance. We demonstrate that it can be used for a broad range of important tasks, including understanding US polarization, historical legislative development, and sentiment detection. We provide open-source software for fitting the model.

Topics, Concepts, and Measurement: A Crowdsourced Procedure for Validating Topics as Measures
Luwei Ying, Jacob M. Montgomery, Brandon M. Stewart
Journal:

Political Analysis / Volume 30 / Issue 4 / October 2022

Published online by Cambridge University Press:

27 September 2021, pp. 570-589
- Article
Topic models, as developed in computer science, are effective tools for exploring and summarizing large document collections. When applied in social science research, however, they are commonly used for measurement, a task that requires careful validation to ensure that the model outputs actually capture the desired concept of interest. In this paper, we review current practices for topic validation in the field and show that extensive model validation is increasingly rare, or at least not systematically reported in papers and appendices. To supplement current practices, we refine an existing crowd-sourcing method by Chang and coauthors for validating topic quality and go on to create new procedures for validating conceptual labels provided by the researcher. We illustrate our method with an analysis of Facebook posts by U.S. Senators and provide software and guidance for researchers wishing to validate their own topic models. While tailored, case-specific validation exercises will always be best, we aim to improve standard practices by providing a general-purpose tool to validate topics as measures.

The Global Diffusion of Law: Transnational Crime and the Case of Human Trafficking
Beth A. Simmons, Paulette Lloyd, Brandon M. Stewart
Journal:

International Organization / Volume 72 / Issue 2 / Spring 2018

Published online by Cambridge University Press:

02 April 2018, pp. 249-281

Print publication:

Spring 2018
- Article
In the past few decades new laws criminalizing certain transnational activities have proliferated: from money laundering, corruption, and insider trading to trafficking in weapons and drugs. Human trafficking is one example. We argue that criminalization of trafficking in persons has diffused in large part because of the way the issue has been framed: primarily as a problem of organized crime rather than predominantly an egregious human rights abuse. Framing human trafficking as an organized crime practice empowers states to confront cross-border human movements viewed as potentially threatening. We show that the diffusion of criminalization is explained by road networks that reflect potential vulnerabilities to the diversion of transnational crime. We interpret our results as evidence of the importance of context and issue framing, which in turn affects perceptions of vulnerability to neighbors' policy choices. In doing so, we unify diffusion studies of liberalization with the spread of prohibition regimes to explain the globalization of aspects of criminal law.

Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts
Part of
- PA editors' choice articles
Justin Grimmer, Brandon M. Stewart
Journal:

Political Analysis / Volume 21 / Issue 3 / Summer 2013

Published online by Cambridge University Press:

04 January 2017, pp. 267-297
- Article
Politics and political conflict often occur in the written and spoken word. Scholars have long recognized this, but the massive costs of analyzing even moderately sized collections of texts have hindered their use in political science research. Here lies the promise of automated text analysis: it substantially reduces the costs of analyzing large collections of text. We provide a guide to this exciting new area of research and show how, in many instances, the methods have already obtained part of their promise. But there are pitfalls to using automated methods—they are no substitute for careful thought and close reading and require extensive and problem-specific validation. We survey a wide range of new methods, provide guidance on how to validate the output of the models, and clarify misconceptions and errors in the literature. To conclude, we argue that for automated text methods to become a standard tool for political scientists, methodologists must contribute new methods and new methods of validation.

Computer-Assisted Text Analysis for Comparative Politics
Christopher Lucas, Richard A. Nielsen, Margaret E. Roberts, Brandon M. Stewart, Alex Storer, Dustin Tingley
Journal:

Political Analysis / Volume 23 / Issue 2 / Spring 2015

Published online by Cambridge University Press:

04 January 2017, pp. 254-277
- Article
Recent advances in research tools for the systematic analysis of textual data are enabling exciting new research throughout the social sciences. For comparative politics scholars, who are often interested in non-English and possibly multilingual textual datasets, these advances may be difficult to access. This article discusses practical issues that arise in the processing, management, translation, and analysis of textual data with a particular focus on how procedures differ across languages. These procedures are combined in two applied examples of automated text analysis using the recently introduced Structural Topic Model. We also show how the model can be used to analyze data that have been translated into a single language via machine translation tools. All the methods we describe here are implemented in open-source software packages available from the authors.

2 - Navigating the Local Modes of Big Data: The Case of Topic Models
from PART 1 - COMPUTATIONAL SOCIAL SCIENCE TOOLS
- By Margaret E. Roberts, University of California, San Diego, Brandon M. Stewart, Princeton University, Dustin Tingley, Harvard University
Edited by R. Michael Alvarez, California Institute of Technology
Book:

Computational Social Science

Published online:

05 March 2016

Print publication:

07 March 2016, pp 51-97
- Chapter
Summary

INTRODUCTION
Each day humans generate massive volumes of data in a variety of different forms (Lazer et al., 2009). For example, digitized texts provide a rich source of political content through standard media sources such as newspapers, as well as newer forms of political discourse such as tweets and blog posts. In this chapter we analyze a corpus of 13,246 posts that were written for six political blogs during the course of the 2008 U.S. presidential election. But this is just one small example. An aggregator of nearly every document produced by the U.S. federal government, voxgov.com, has collected more than eight million documents from 2010–2014, including over a million tweets from members of Congress. These data open new possibilities for studies of all aspects of political life from public opinion (Hopkins and King, 2010) to political control (King, Pan, and Roberts, 2013) to political representation (Grimmer, 2013).
The explosion of new sources of political data has been met by the rapid development of new statistical tools for meeting the challenges of analyzing “big data” (National Research Council, 2013; Grimmer and Stewart, 2013; Fan, Han, and Liu, 2014). A prominent example in the field of text analysis is latent Dirichlet allocation (LDA) (Blei, Ng, and Jordan, 2003; Blei, 2012), a topic model that uses patterns of word co-occurrences to discover latent themes across documents. Topic models can help us deal with the reality that large data sets of text are also typically unstructured. In this chapter we focus on a particular variant of LDA, the structural topic model (STM) (Roberts et al., 2014), which provides a framework to relate the corpus structure we do have (in the form of document-level metadata) with the inferred topical structure of the model.
Techniques for automated text analysis have been thoroughly reviewed elsewhere (Grimmer and Stewart, 2013). We instead focus on a less often discussed feature of topic models and of latent variable models more broadly: multimodality. That is, the models discussed here give rise to optimization problems that are nonconvex. Thus, unlike workhorse tools such as linear regression, the solution we find can be sensitive to our starting values (in technical parlance, the function we are optimizing has multiple modes). We engage directly with this issue of multimodality, helping the reader understand why it arises and what can be done about it.
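The sensitivity to starting values described in the summary can be illustrated with a minimal sketch. It does not fit a topic model: it runs plain gradient descent on a one-dimensional toy function chosen for illustration, and the names `loss`, `grad`, and `descend`, the step size, and the starting points are all assumptions of this example rather than anything taken from the chapter.

```python
def loss(x: float) -> float:
    """Toy nonconvex objective with two local minima (illustrative only)."""
    return x**4 - 3 * x**2 + x


def grad(x: float) -> float:
    """Analytic derivative of the toy objective."""
    return 4 * x**3 - 6 * x + 1


def descend(x0: float, lr: float = 0.01, steps: int = 5000) -> float:
    """Run fixed-step gradient descent from the starting value x0."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x


if __name__ == "__main__":
    # Same optimizer, same objective, different starting values.
    for start in (-2.0, 2.0):
        end = descend(start)
        print(f"start={start:+.1f} -> x={end:+.4f}, loss={loss(end):.4f}")
```

The two runs settle in different minima, and only the run started on the left reaches the lower loss. Rerunning from several starting values and comparing the final objective is one simple way to detect this kind of behavior.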

7 - Combating Transnational Crime
from Part II - Actors
- By Paulette Lloyd, Beth A. Simmons, Brandon M. Stewart
Edited by Michael Zürn, Wissenschaftszentrum Berlin für Sozialforschung, André Nollkaemper, Universiteit van Amsterdam, Randy Peerenboom, La Trobe University, Victoria
Book:

Rule of Law Dynamics

Published online:

05 July 2012

Print publication:

18 June 2012, pp 153-180
- Chapter

List of Contributors
- By Helmut Philipp Aust, Tim Gemkow, John Gillespie, Linn Hammergren, Monika Heupel, Paulette Lloyd, Wolfgang Merkel, André Nollkaemper, Georg Nolte, Sarah M. H. Nouwen, Randall Peerenboom, Tilmann J. Röder, Frank Schimmelfennig, Gunnar Folke Schuppert, Beth A. Simmons, Brandon M. Stewart, Richard Zajac Sannerholm, Michael Zürn
Edited by Michael Zürn, Wissenschaftszentrum Berlin für Sozialforschung, André Nollkaemper, Universiteit van Amsterdam, Randy Peerenboom, La Trobe University, Victoria
Book:

Rule of Law Dynamics

Published online:

05 July 2012

Print publication:

18 June 2012, pp xi-xiv
- Chapter
